Parliamentary Amendments
A parliamentary amendment is a proposal made by a member of parliament or a legislator to specify provisions in a bill or legislation that is being considered by the parliament. This term is commonly used in the legislative processes of various countries, including Brazil, often to earmark funds for specific projects.
This chapter presents a basic (and incomplete!) example of a data science approach to analysis of a very simple parliamentary amendment database. The data was obtained from a government itself.
The examples in this chapter show how even a simple database can be explored and used to give ideas for more interesting questions that may require more data.
The Data
We can download a list of parliamentary amendments as a CSV file from the Federal Government Transparency Open Data Portal (Dados Abertos | Portal da Transparência do Governo Federal). A data dictionary, in Portuguese, can be found at Dicionário de Dados - Emendas Parlamentares. An online, searchable version of the data can be explored at Consulta de Emendas Parlamentares | Portal da Transparência do Governo Federal.
The first lines of that file are shown below:
"Código da Emenda";"Ano da Emenda";"Tipo de Emenda";"Código do Autor da Emenda";"Nome do Autor da Emenda";"Número da emenda";"Localidade do gasto";"Código Função";"Nome Função";"Código Subfunção";"Nome Subfunção";"Valor Empenhado";"Valor Liquidado";"Valor Pago";"Valor Restos A Pagar Inscritos";"Valor Restos A Pagar Cancelados";"Valor Restos A Pagar Pagos"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"POMBAL - PB";"15";"Urbanismo";"451";"infra-estrutura urbana";"150000,00";"0,00";"0,00";"0,00";"146950,00";"3050,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"CAIÇARA - RS";"10";"Saúde";"302";"Assistência hospitalar e ambulatorial";"250000,00";"0,00";"0,00";"0,00";"0,00";"250000,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"PIAUÍ (UF)";"10";"Saúde";"572";"Desenvolvimento tecnológico e engenharia";"896692,97";"0,00";"0,00";"0,00";"0,00";"896692,97"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"SANTANA DE PARNAÍBA - SP";"15";"Urbanismo";"451";"infra-estrutura urbana";"500000,00";"0,00";"0,00";"0,00";"493100,00";"6900,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"PALMITINHO - RS";"10";"Saúde";"302";"Assistência hospitalar e ambulatorial";"250000,00";"0,00";"0,00";"0,00";"0,00";"250000,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"JANDIRA - SP";"10";"Saúde";"302";"Assistência hospitalar e ambulatorial";"1199996,15";"0,00";"0,00";"0,00";"499996,15";"700000,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"FUNILÂNDIA - MG";"20";"Agricultura";"608";"Promoção da produção agropecuária";"398737,97";"0,00";"0,00";"0,00";"10000,00";"388737,97"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"VARGEM GRANDE PAULISTA - SP";"15";"Urbanismo";"451";"infra-estrutura urbana";"250000,00";"0,00";"0,00";"0,00";"0,00";"250000,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"RESENDE - RJ";"27";"Desporto e lazer";"812";"Desporto comunitário";"692500,00";"0,00";"0,00";"0,00";"285187,50";"407312,50"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"BURITIRAMA - BA";"27";"Desporto e lazer";"812";"Desporto comunitário";"400000,00";"0,00";"0,00";"0,00";"0,00";"400000,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"JARDIM - MS";"10";"Saúde";"302";"Assistência hospitalar e ambulatorial";"3380000,00";"0,00";"0,00";"0,00";"2600000,00";"780000,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"LUZ - MG";"10";"Saúde";"302";"Assistência hospitalar e ambulatorial";"250000,00";"0,00";"0,00";"0,00";"0,00";"250000,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"MATOZINHOS - MG";"10";"Saúde";"301";"Atenção básica";"199890,00";"0,00";"0,00";"0,00";"0,00";"199890,00"
"Sem informaç";"2014";"Emenda Individual - Transferências";"S/I";"Sem informação";"S/I";"MOCOCA - SP";"10";"Saúde";"301";"Atenção básica";"498940,80";"0,00";"0,00";"0,00";"0,00";"498940,80"
We can notice that the fields on the file are separated by semicolons. Its encoding is ISO-8859. To make things even more complex, the monetary values are in Brazilian format, with comma as the separator between integer and decimal parts.
Reading and Preprocessing
To read it into a dataframe in Python (Rossum and Jr. 2001) and Pandas (McKinney 2012) we first need to import the libraries:
import pandas as pd
import plotly.express as px
import jsonNow we can read the file into a dataframe, using the correct encoding and field separators:
file = 'Resources/Data/Emendas/Emendas.csv'
# Read the CSV file using the specified separator and encoding.
df = pd.read_csv(file, sep=';', encoding='ISO-8859-1')The next step is to convert the field Valor Empenhado (the one that interest us more) to numeric, fixing the decimals’ separator – just replace some characters and convert it to float.
df['Valor'] = df['Valor Empenhado'].str.replace('.', '', regex=False).\
str.replace(',', '.').astype(float)Some of the fields on this dataframe are redundant, being numeric codes for textual descriptions. Wen can drop those. Just to be on the safe side let’s also keep only the amendment’s author code but not his/her name.
# We don't really need some of the fields (redundant ones):
df = df.drop(['Nome do Autor da Emenda', 'Número da emenda',
'Código Função','Código Subfunção'], axis=1)Of all the columns with values’ information, the one that interest us is the Valor, which was converted from Valor Empenhado (value that was committed for that amendment). Let’s keep that one and drop the others.
# We don't really need some of the payment fields (only 'Valor'):
df = df.drop(['Valor Empenhado', 'Valor Liquidado', 'Valor Pago',
'Valor Restos A Pagar Inscritos','Valor Restos A Pagar Cancelados',
'Valor Restos A Pagar Pagos'], axis=1) Let’s take a look at the data:
# Display part of the data.
df| Código da Emenda | Ano da Emenda | Tipo de Emenda | Código do Autor da Emenda | Localidade do gasto | Nome Função | Nome Subfunção | Valor | |
|---|---|---|---|---|---|---|---|---|
| 0 | Sem informaç | 2014 | Emenda Individual - Transferências | S/I | PIAUÍ (UF) | Saúde | Desenvolvimento tecnológico e engenharia | 896692.97 |
| 1 | Sem informaç | 2014 | Emenda Individual - Transferências | S/I | CAIÇARA - RS | Saúde | Assistência hospitalar e ambulatorial | 250000.00 |
| 2 | Sem informaç | 2014 | Emenda Individual - Transferências | S/I | CRUZEIRO DO SUL - AC | Defesa nacional | Assistência comunitária | 1250000.00 |
| 3 | Sem informaç | 2014 | Emenda Individual - Transferências | S/I | MOCOCA - SP | Saúde | Atenção básica | 498940.80 |
| 4 | Sem informaç | 2014 | Emenda Individual - Transferências | S/I | FUNILÂNDIA - MG | Agricultura | Promoção da produção agropecuária | 398737.97 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64584 | 202444840006 | 2024 | Emenda Individual - Transferências | 4484 | Nacional | Defesa nacional | Administração geral | 670490.00 |
| 64585 | 202444840009 | 2024 | Emenda Individual - Transferências | 4484 | Nacional | Defesa nacional | Defesa terrestre | 599982.26 |
| 64586 | 202444840011 | 2024 | Emenda Individual - Transferências | 4484 | Nacional | Defesa nacional | Defesa terrestre | 420000.00 |
| 64587 | 202444840012 | 2024 | Emenda Individual - Transferências | 4484 | RIO GRANDE DO SUL (UF) | Saúde | Atenção básica | 350000.00 |
| 64588 | 202444840013 | 2024 | Emenda Individual - Transferências | 4484 | RIO GRANDE DO SUL (UF) | Assistência social | Assistência comunitária | 3750000.00 |
64589 rows × 8 columns
We can see that there are Localidade do Gasto for the whole country, for abroad (Exterior), for regions, for states, for counties and in multiple locations (MÚLTIPLO). If we’re going to compare some amendments per location, it is better to split the data into those different levels:
dfCountry = df[df["Localidade do gasto"] == "Nacional"]
dfAbroad = df[df["Localidade do gasto"] == "Exterior"]
dfMultiple = df[df["Localidade do gasto"] == "MÚLTIPLO"]
regions = ["Norte","Centro-Oeste","Nordeste","Sudeste","Sul"]
dfRegions = df[df["Localidade do gasto"].isin(regions)]
dfStates = df[df["Localidade do gasto"].str.contains(r"\(\w{2}\)", regex=True)]
dfCounties = df[df["Localidade do gasto"].str.contains(" - ")]Let’s count how many records we have in each subset, also calculate the sum to see if we have any record not in a subset:
dfCount = len(df)
dfCountryCount = len(dfCountry)
dfAbroadCount = len(dfAbroad)
dfMultipleCount = len(dfMultiple)
dfRegionsCount = len(dfRegions)
dfStatesCount = len(dfStates)
dfCountiesCount = len(dfCounties)
total = dfCountryCount+dfAbroadCount+dfMultipleCount+\
dfRegionsCount+dfStatesCount+dfCountiesCount
print("Number of records in the original DataFrame:", dfCount)
print("Number of records in the 'Nacional' subset:", dfCountryCount)
print("Number of records in the 'Exterior' subset:", dfAbroadCount)
print("Number of records in the 'Multiple' subset:", dfMultipleCount)
print("Number of records in the 'Regions' subset:", dfRegionsCount)
print("Number of records in the 'State (UF)' subset:", dfStatesCount)
print("Number of records in the 'County - State' subset:", dfCountiesCount)
print("Total number of records in the subsets:",total)Number of records in the original DataFrame: 64589
Number of records in the 'Nacional' subset: 4604
Number of records in the 'Exterior' subset: 54
Number of records in the 'Multiple' subset: 607
Number of records in the 'Regions' subset: 332
Number of records in the 'State (UF)' subset: 41976
Number of records in the 'County - State' subset: 17016
Total number of records in the subsets: 64589
There are other ways to partition or subpartition the data, with or without overlapping between the subsets. Which ones you suggest, and why?
Instead of partitioning the dataframe into different categories we could annotate it, creating a new field that identified whether the amendment’s location is National, State, etc. This may help in other analysis and exercises.
Basic EDA
Let’s use different subsets of the data to illustrate different EDA concepts.
Functions and Subfunctions
Functions (Nome Função) and subfunctions (Nome Subfunção) are the categories of application of the amendments’ values. Let’s take a look at their combinations and frequency.
cfunctions = df.groupby(["Nome Função","Nome Subfunção"]).size().reset_index(name='Count')
cfunctions| Nome Função | Nome Subfunção | Count | |
|---|---|---|---|
| 0 | Administração | Administração geral | 26 |
| 1 | Administração | Comunicação social | 4 |
| 2 | Administração | Controle interno | 4 |
| 3 | Administração | Desenvolvimento científico | 1 |
| 4 | Administração | Direitos individuais, coletivos e difusos | 2 |
| ... | ... | ... | ... |
| 204 | Urbanismo | Outras transferências | 2 |
| 205 | Urbanismo | Planejamento e orçamento | 2 |
| 206 | Urbanismo | Serviços urbanos | 7 |
| 207 | Urbanismo | Transportes coletivos urbanos | 46 |
| 208 | Urbanismo | infra-estrutura urbana | 4090 |
209 rows × 3 columns
We can create sunburst charts to show the distribution of amendments by functions and subfunctions. For consistency we will create a colormap for the sunburst charts that ensure the use of the same color for the same function:
functionsU = cfunctions["Nome Função"].unique()
colors = px.colors.qualitative.Plotly
# Create a dictionary to map each "Nome Função" to a specific color
colormap = {function: colors[i % len(colors)] for i, function in enumerate(functionsU)}With the grouped data and colormap we can create the sunburst chart with the code below:
# Create a Sunburst chart
sunburstC = px.sunburst(cfunctions,
path=['Nome Função', 'Nome Subfunção'],
values='Count',
title='Number of Amendments by Function and Subfunction',
labels={'Nome Função': 'Function', 'Nome Subfunção': \
'Subfunction', 'Count': 'Number of Amendments'},
color='Nome Função',
color_discrete_map=colormap)
# Display the chart
sunburstC.show()Let’s do the same chart but considering the total amount of the amendments. All we need to do is create a different grouping of the data:
vfunctions = df.groupby(["Nome Função", "Nome Subfunção"])["Valor"].sum().reset_index(name='Amount')
vfunctions| Nome Função | Nome Subfunção | Amount | |
|---|---|---|---|
| 0 | Administração | Administração geral | 2.741300e+07 |
| 1 | Administração | Comunicação social | 7.571335e+05 |
| 2 | Administração | Controle interno | 9.034404e+06 |
| 3 | Administração | Desenvolvimento científico | 7.836601e+06 |
| 4 | Administração | Direitos individuais, coletivos e difusos | 1.955050e+06 |
| ... | ... | ... | ... |
| 204 | Urbanismo | Outras transferências | 6.500000e+05 |
| 205 | Urbanismo | Planejamento e orçamento | 2.629205e+06 |
| 206 | Urbanismo | Serviços urbanos | 6.343775e+06 |
| 207 | Urbanismo | Transportes coletivos urbanos | 5.076996e+08 |
| 208 | Urbanismo | infra-estrutura urbana | 1.762114e+10 |
209 rows × 3 columns
Then plot the sunburst chart (using the same colormap as the previous chart to make comparisons easier):
# Create a Sunburst chart
sunburstV = px.sunburst(vfunctions,
path=['Nome Função', 'Nome Subfunção'],
values='Amount',
title='Total Amount of Amendments by Function and Subfunction',
labels={'Nome Função': 'Function', 'Nome Subfunção': \
'Subfunction', 'Count': 'Number of Amendments'},
color='Nome Função',
color_discrete_map=colormap)
# Display the chart
sunburstV.show()Do a quick comparison between the two plots and find differences in the visibility of their slices.
Countrywide Amendments
How many types of Nome Função (main function) do we have for countrywide amendments?
functionCounts = dfCountry["Nome Função"].value_counts()
functionCountsNome Função
Saúde 1431
Defesa nacional 1035
Direitos da cidadania 318
Segurança pública 266
Agricultura 229
Cultura 193
Educação 157
Encargos especiais 146
Assistência social 121
Ciência e Tecnologia 107
Organização agrária 104
Desporto e lazer 98
Gestão ambiental 66
Trabalho 64
Urbanismo 58
Transporte 47
Múltiplo 41
Administração 28
Comércio e serviços 24
Comunicações 16
Previdência social 14
Relações exteriores 13
Indústria 11
Saneamento 7
Energia 6
Essencial à justiça 2
Habitação 2
Name: count, dtype: int64
Are all those functions spread evenly through the years? Let’s compare them with a plot. First we group the data by Nome Função and Ano da Emenda counting how many amendments we had in each group:
cpy = dfCountry.groupby(["Nome Função","Ano da Emenda"]).size().\
reset_index(name='Count')
cpy| Nome Função | Ano da Emenda | Count | |
|---|---|---|---|
| 0 | Administração | 2016 | 3 |
| 1 | Administração | 2017 | 6 |
| 2 | Administração | 2019 | 2 |
| 3 | Administração | 2020 | 9 |
| 4 | Administração | 2022 | 3 |
| ... | ... | ... | ... |
| 208 | Urbanismo | 2020 | 11 |
| 209 | Urbanismo | 2021 | 9 |
| 210 | Urbanismo | 2022 | 16 |
| 211 | Urbanismo | 2023 | 1 |
| 212 | Urbanismo | 2024 | 3 |
213 rows × 3 columns
Then we create a faceted line plot, with a facet for each Nome Função.
# Create a faceted line plot
fig = px.line(cpy,
x='Ano da Emenda',
y='Count',
color='Nome Função',
facet_col='Nome Função',
facet_col_wrap=3,
title='Amendments per Year per Function',
labels={'Ano da Emenda': 'Year', 'Count': 'Amendments',
'Nome Função': ''},
markers=True)
# Update layout for better spacing and readability
fig.update_layout(
height=1020,
showlegend=False,
margin=dict(t=80)
)
# Remove the automatic subplot titles
for annotation in fig['layout']['annotations']:
annotation['text'] = annotation['text'].split("=")[-1]
# Update x-axes to show ticks and labels for all subplots
years = cpy["Ano da Emenda"].unique()
years.sort()
fig.update_xaxes(showticklabels=True,tickvals=years,tickangle=90,
tickfont=dict(size=10))
fig.update_yaxes(title_font=dict(size=10))
# Display the plot
fig.show()Not all functions had amendments for each year in our data – some have gaps, while some happened only during a short period.
The plot above counted the number of amendments. Let’s modify it so it shows the total value of the amendments, under the same conditions. First we group the data:
# Group by "Nome Função" and "Ano da Emenda" and sum the values
spy = dfCountry.groupby(["Nome Função", "Ano da Emenda"])["Valor"].sum().\
reset_index(name='Total Value')
spy| Nome Função | Ano da Emenda | Total Value | |
|---|---|---|---|
| 0 | Administração | 2016 | 2.144076e+07 |
| 1 | Administração | 2017 | 4.561856e+08 |
| 2 | Administração | 2019 | 2.229047e+07 |
| 3 | Administração | 2020 | 4.494842e+07 |
| 4 | Administração | 2022 | 2.306679e+06 |
| ... | ... | ... | ... |
| 208 | Urbanismo | 2020 | 1.779921e+09 |
| 209 | Urbanismo | 2021 | 2.782494e+09 |
| 210 | Urbanismo | 2022 | 4.040173e+08 |
| 211 | Urbanismo | 2023 | 3.000242e+09 |
| 212 | Urbanismo | 2024 | 5.061627e+08 |
213 rows × 3 columns
Then we do the same faceted plot:
# Create a faceted line plot
fig = px.line(spy,
x='Ano da Emenda',
y='Total Value',
color='Nome Função',
facet_col='Nome Função',
facet_col_wrap=3,
title='Amendments Total Value per Year per Function',
labels={'Ano da Emenda': 'Year', 'Count': 'Amendmts', 'Nome Função': ''},
markers=True)
# Update layout for better spacing and readability
fig.update_layout(
height=990,
showlegend=False,
margin=dict(t=80)
)
# Remove the automatic subplot titles
for annotation in fig['layout']['annotations']:
annotation['text'] = annotation['text'].split("=")[-1]
# Update x-axes to show ticks and labels for all subplots
years = spy["Ano da Emenda"].unique()
years.sort()
fig.update_xaxes(showticklabels=True,tickvals=years,tickangle=90,tickfont=dict(size=10))
fig.update_yaxes(title_font=dict(size=10))
# Display the plot
fig.show()Do a quick comparison between the two plots to find functions and years with a large value but small number of amendments.
Do the same faceted plot for your region, state and county. Point differences between the number of amendments and values.
Amendments per State
Let’s take a look at the amendments per state dataframe and some subsets of it. Just for fun let’s use choropleths – maps that represent variables.
We can use a simple 3-step process to create a choropleth: first get one GeoJSON file ready for our purposes, then prepare our data so it matches what is on the GeoJSON then create the plot itself.
A quick search on the web will show several sites with GeoJSON data that can be used for our project. Two are Moisés Lima’s brazil-states-geojson at Kaggle, and Giuliano Macedo geodata-br-states at Github. We’ll use this second one, which was downloaded and stored locally here.
Then we need to prepare the subset of our data that contains states’ amendments only, and also rename the Localidade do gasto field so it contains only the states’ name in uppercase:
dfStatesPrep = dfStates.groupby(["Localidade do gasto"]).size().reset_index(name='Count')
dfStatesPrep['Localidade do gasto'] = \
dfStatesPrep['Localidade do gasto'].str.replace(r'\ \(UF\)', '', regex=True)
dfStatesPrep| Localidade do gasto | Count | |
|---|---|---|
| 0 | ACRE | 813 |
| 1 | ALAGOAS | 784 |
| 2 | AMAPÁ | 690 |
| 3 | AMAZONAS | 602 |
| 4 | BAHIA | 2924 |
| 5 | CEARÁ | 1452 |
| 6 | DISTRITO FEDERAL | 961 |
| 7 | ESPÍRITO SANTO | 1228 |
| 8 | GOIÁS | 1690 |
| 9 | MARANHÃO | 1006 |
| 10 | MATO GROSSO | 610 |
| 11 | MATO GROSSO DO SUL | 830 |
| 12 | MINAS GERAIS | 4413 |
| 13 | PARANÁ | 2487 |
| 14 | PARAÍBA | 1153 |
| 15 | PARÁ | 1246 |
| 16 | PERNAMBUCO | 2063 |
| 17 | PIAUÍ | 780 |
| 18 | RIO DE JANEIRO | 2932 |
| 19 | RIO GRANDE DO NORTE | 1176 |
| 20 | RIO GRANDE DO SUL | 2801 |
| 21 | RONDÔNIA | 595 |
| 22 | RORAIMA | 531 |
| 23 | SANTA CATARINA | 1437 |
| 24 | SERGIPE | 827 |
| 25 | SÃO PAULO | 5216 |
| 26 | TOCANTINS | 729 |
The GeoJSON file contains coordinates for the shapes of the object it contains and also some properties for each object. In our case we have a property named Estado that contains the state’s names. We will load the GeoJSON file and change the values of the Estado property so these will also be in uppercase:
with open("Resources/Data/Emendas/br_states.json") as f:
geojson_data = json.load(f)
for feature in geojson_data['features']:
feature['properties']['Estado'] = feature['properties']['Estado'].upper()Now that we prepared the data and the GeoJSON file we can create the choropleth with Plotly with this code:
fig = px.choropleth(
dfStatesPrep, # The dataframe
geojson=geojson_data, # The GeoJSON data
# These two parameters identify the field on the dataframe and
# property on the GeoJSON data that must match.
locations='Localidade do gasto',
featureidkey="properties.Estado",
color='Count',
hover_name='Localidade do gasto',
hover_data=['Count'],
title='Number of Amendments per State',
color_continuous_scale="Viridis"
)
# Update layout for better spacing and readability
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":35,"l":0,"b":0})
# Display the map
fig.show()Do a similar choropleth for the total amount of amendments per state.
Do a similar choropleth for a subset of the States’ data for a specific function (get a popular one!), showing either the number of amendments or total amount.
Find a GeoJSON file for your state, and create a choropleth for all counties in your state with the total amount of amendments for a specific function.
Get the population of states (and regions) from Wikipedia: Lista de unidades federativas do Brasil por população or Federative units of Brazil. Calculate and plot the number of amendments and total value per state considering the population. Which charts can help visualize and compare this data?
Amendments per Functions and Years
Let’s pick up some interesting functions (Defesa nacional, Educação and Saúde) and compare the total value of the amendments they received per year. We’ll consider amendments that were destined to the whole country.
dfDefense = dfCountry[dfCountry["Nome Função"] == "Defesa nacional"]
dfDefenseY = dfDefense.groupby("Ano da Emenda")["Valor"].sum().reset_index()
dfDefenseY['Type'] = 'Defesa Nacional'
dfEducation = dfCountry[dfCountry["Nome Função"] == "Educação"]
dfEducationY = dfEducation.groupby("Ano da Emenda")["Valor"].sum().reset_index()
dfEducationY['Type'] = 'Educação'
dfHealth = dfCountry[dfCountry["Nome Função"] == "Saúde"]
dfHealthY = dfHealth.groupby("Ano da Emenda")["Valor"].sum().reset_index()
dfHealthY['Type'] = 'Saúde'
# Combine them
bothDfs = pd.concat([dfDefenseY,dfEducationY,dfHealthY])Then create the chart:
fig = px.line(bothDfs, x="Ano da Emenda", y="Valor", color='Type',
title="Total Amount of Amendments per Year",
labels={"Ano da Emenda": "Year", "Count": "Number of Amendments",\
"Type": "Type"})
fig.show()How would the information on this chart compare to amendments destined for states or regions?
Final Remarks
Most of the analysis in this section were done considering that the data file for the amendments is all we have, therefore, some of the results were interesting but inconclusive – much more data, from other sources, would be required for real investigative work.
For example, we may see patterns or outliers in some charts, and maybe some can be explained with common sense (e.g. an increased spending on health around 2020) but explanation of some (e.g. an increased spending on education, also around 2020) would require more context or external data.
There is more data on the site Consulta de Emendas Parlamentares | Portal da Transparência do Governo Federal, and it is possible to filter for more fields and download part of the data. For example, it is possible to search for amendments related with budgetary actions of the government, such as “005O - FOMENTO A PESQUISA CIENTIFICA, INOVACAO E EXTENSAO TECNOLOGICA NA AMAZONIA LEGAL” that would relate to all budget for research, innovation and outreach on the Amazon region, but there aren’t any amendments for that. Nonetheless interested readers should explore that site to see what other data can be obtained.
There are many other interesting information sources that could be used to complement the ones presented in this section, that may or may not be easy to find and use. Some ideas are:
- Find information on IDH (Human Development Index) for counties and states in different years and compare the IDH before and after application of the amendments’ resources.
- Find population information (preferrably stratified) and find patterns and outliers of application of amendments for counties that have similar population profiles.
- Get information on the political parties of the amendments’ authors for grouping and analysis accordingly to parties.